AI-Powered YouTube Summarizer and QA Tool with RAG, LangChain, and FAISS

Estimated time needed: 1 hour

In this project, you'll build a question-answering (QA) tool capable of extracting and summarizing information from YouTube videos. Leveraging LangChain and a large language model (LLM), the tool will answer specific questions based on a video's transcript. You'll work with components like video transcript loaders, text processors, embedding models, vector databases, and retrievers, while using Gradio for a user-friendly interface.


With the explosion of online video content, manually searching through lengthy footage is inefficient. This project automates that process, transforming dense transcripts into concise summaries and enabling precise video segment identification using Facebook AI Similarity Search (FAISS). By the end of the project, you'll have developed a powerful system that streamlines how we interact with multimedia data, making video content more accessible and insightful.

Setup

Setting up a virtual environment

First, let's create a virtual environment. A virtual environment allows you to manage dependencies for different projects separately, avoiding conflicts between package versions.

To open a terminal, go to the top menu and click Terminal > New Terminal.


In the terminal of your Cloud IDE, ensure that you are in the path /home/project, then run the following commands to create a Python virtual environment.

```shell
pip install virtualenv
virtualenv my_env           # create a virtual environment named my_env
source my_env/bin/activate  # activate my_env
```

Installing prerequisite libraries

To ensure seamless execution of your scripts, and considering that certain functions within these scripts rely on external libraries, it's essential to install some prerequisite libraries before you begin. For this project, the key libraries you'll need are:

  • youtube-transcript-api for extracting transcripts from YouTube videos.
  • faiss-cpu for efficient similarity search.
  • langchain and langchain-community for text processing and language models.
  • ibm-watsonx-ai and langchain-ibm for integrating IBM Watson services.
  • gradio for building the web application interface.

Here's how to install these packages (from your terminal):

```shell
# installing necessary packages in my_env
pip install youtube-transcript-api==1.2.1
pip install faiss-cpu==1.8.0
pip install langchain==0.2.6 | tail -n 1
pip install langchain-community==0.2.6 | tail -n 1
pip install ibm-watsonx-ai==1.0.10 | tail -n 1
pip install langchain_ibm==0.1.8 | tail -n 1
pip install gradio==4.44.1 | tail -n 1
```

The environment is now ready to create the application.

Construct the YouTube bot

It's time to construct the YouTube bot!

Let's start off by creating a new Python file that will store your bot. Click on the button below to create a new Python file, and call it ytbot.py. If, for whatever reason, the button does not work, you can create the new file by clicking File > New Text File. Be sure to save the file as ytbot.py.

In the following sections, you will populate ytbot.py with your bot.

Import necessary libraries

Inside ytbot.py, import the following libraries from gradio, youtube_transcript_api, ibm_watsonx_ai, langchain_ibm, langchain, and langchain_community. The imported classes are required for initializing models with the correct credentials, splitting text, initializing a vector store, loading YouTube transcripts, generating a question-answer retriever, and building the Gradio interface.

```python
# Import necessary libraries for the YouTube bot
import gradio as gr
import re  # For extracting the video ID
from youtube_transcript_api import YouTubeTranscriptApi  # For extracting transcripts from YouTube videos
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For splitting text into manageable segments
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes  # For specifying model types
from ibm_watsonx_ai import APIClient, Credentials  # For API client and credentials management
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams  # For managing model parameters
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods  # For defining decoding methods
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings  # For interacting with IBM's LLMs and embeddings
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs  # For retrieving model specifications
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes  # For specifying types of embeddings
from langchain_community.vectorstores import FAISS  # For efficient vector storage and similarity search
from langchain.chains import LLMChain  # For creating chains of operations with LLMs
from langchain.prompts import PromptTemplate  # For defining prompt templates
```

Extracting YouTube transcripts

Video ID extraction

YouTube video URLs typically follow this format: https://www.youtube.com/watch?v=VIDEO_ID

The VIDEO_ID is a unique 11-character string that identifies the video. To extract this ID, we'll use a regular expression that captures this 11-character string from the URL.

Define a function to extract the video ID from the provided YouTube URL.

```python
def get_video_id(url):
    # Regex pattern to match YouTube video URLs
    pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None
```

Usage details

The get_video_id() function is designed to extract the unique VIDEO_ID from a YouTube URL.

Function call

```python
url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
video_id = get_video_id(url)
print(video_id)  # Output: dQw4w9WgXcQ
```

Expected output

  • If the URL matches the YouTube format: the function returns the extracted 11-character VIDEO_ID (for example, dQw4w9WgXcQ).
  • If the URL does not match: the function returns None.
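If you also want to accept youtu.be short links or watch URLs with extra query parameters, a more permissive variant can be sketched as follows. Note that `get_video_id_flexible` is a hypothetical extension, not part of the project code:

```python
import re

def get_video_id_flexible(url):
    # Hypothetical extension of get_video_id(): also accepts youtu.be short
    # links and watch URLs with extra query parameters before v=.
    patterns = [
        r'youtube\.com\/watch\?.*?v=([a-zA-Z0-9_-]{11})',  # standard watch URL
        r'youtu\.be\/([a-zA-Z0-9_-]{11})',                 # short-link format
    ]
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

print(get_video_id_flexible("https://youtu.be/dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```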

Fetching transcripts from YouTube

The YouTubeTranscriptApi allows us to retrieve transcripts (subtitles) for a given video. This function first extracts the video ID from the YouTube URL, then fetches the transcripts available for that video. Transcripts can be either automatically generated or manually provided by the video uploader.

Here's the function to fetch the transcript:

```python
def get_transcript(url):
    # Extract the video ID from the URL
    video_id = get_video_id(url)
    # Create a YouTubeTranscriptApi object
    ytt_api = YouTubeTranscriptApi()
    # Fetch the list of available transcripts for the given YouTube video
    transcripts = ytt_api.list(video_id)
    transcript = ""
    for t in transcripts:
        # Check if the transcript's language is English
        if t.language_code == 'en':
            if t.is_generated:
                # If no transcript has been set yet, use the auto-generated one
                if len(transcript) == 0:
                    transcript = t.fetch()
            else:
                # If a manually created transcript is found, use it (overrides auto-generated)
                transcript = t.fetch()
                break  # Prioritize the manually created transcript, exit the loop
    return transcript if transcript else None
```
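The priority rule in the loop (a manually created English transcript overrides an auto-generated one) can be sketched with plain stand-in objects in place of the API's transcript entries. `MockTranscript` and `pick_english_transcript` are illustrative names, not part of the project code:

```python
from dataclasses import dataclass

@dataclass
class MockTranscript:
    # Minimal stand-in for a youtube-transcript-api transcript entry
    language_code: str
    is_generated: bool
    content: str

    def fetch(self):
        return self.content

def pick_english_transcript(transcripts):
    # Same priority rule as get_transcript(): keep an auto-generated English
    # transcript only until a manually created one is found.
    chosen = ""
    for t in transcripts:
        if t.language_code == 'en':
            if t.is_generated:
                if not chosen:
                    chosen = t.fetch()
            else:
                chosen = t.fetch()  # manual transcript wins
                break
    return chosen or None

candidates = [
    MockTranscript('en', True, 'auto-generated'),
    MockTranscript('en', False, 'manually created'),
]
print(pick_english_transcript(candidates))  # manually created
```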

Example usage

Following is an example of how to call get_transcript() using a sample YouTube video URL:

```python
# Sample YouTube URL
url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Fetch the transcript
transcript = get_transcript(url)
# Output the fetched transcript
print(transcript)
```

Example output format

```python
[
    {
        "text": "We're no strangers to love.",
        "start": 0.0,
        "duration": 3.5
    },
    {
        "text": "You know the rules and so do I.",
        "start": 3.5,
        "duration": 4.0
    },
    {
        "text": "A full commitment's what I'm thinking of.",
        "start": 7.5,
        "duration": 4.0
    }
]
```

If no English transcript is available, the function returns None. If the video ID is invalid or missing, the underlying API call will raise an error.

Processing the transcript

When we fetch the transcript, it often comes in a structured format. Each entry in the transcript is typically represented as a dictionary with the following structure:

```python
{
    "text": "Transcript text here",
    "start": 0.0,
    "duration": 3.0
}
```
  • text: The spoken text from the video.
  • start: The time (in seconds) when the text starts in the video.
  • duration: The duration (in seconds) for which the text is displayed.

This function transforms the fetched transcript into a more readable format by extracting the text and its corresponding start time.

```python
def process(transcript):
    # Initialize an empty string to hold the formatted transcript
    txt = ""
    # Loop through each entry in the transcript
    for i in transcript:
        try:
            # Entries may be snippet objects (attribute access) or plain dicts
            text = i.text if hasattr(i, "text") else i["text"]
            start = i.start if hasattr(i, "start") else i["start"]
            # Append the text and its start time to the output string
            txt += f"Text: {text} Start: {start}\n"
        except (AttributeError, KeyError, TypeError):
            # If 'text' or 'start' cannot be accessed, skip this entry
            continue
    # Return the processed transcript as a single string
    return txt
```

Example usage

Following is an example of how to call process() using a fetched transcript:

```python
# Sample transcript list
transcript = [
    {"text": "We're no strangers to love.", "start": 0.0, "duration": 3.5},
    {"text": "You know the rules and so do I.", "start": 3.5, "duration": 4.0},
    {"text": "A full commitment's what I'm thinking of.", "start": 7.5, "duration": 4.0}
]
# Process the transcript
formatted_transcript = process(transcript)
# Output the processed transcript
print(formatted_transcript)
```

Expected output

The process() function returns a formatted string that contains the text and start time for each entry in the transcript. The output format will look like this:

```
Text: We're no strangers to love. Start: 0.0
Text: You know the rules and so do I. Start: 3.5
Text: A full commitment's what I'm thinking of. Start: 7.5
```
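Because each line keeps its start time, a matched segment can be turned into a clickable deep link back into the video, which is the basis for "precise video segment identification". A small sketch; the helper names are ours, not part of the project code:

```python
def segment_link(video_id, start_seconds):
    # Build a YouTube URL that jumps straight to the segment's start time
    seconds = int(start_seconds)
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"

def segment_span(entry):
    # The end time of a transcript entry is its start plus its duration
    return entry["start"], entry["start"] + entry["duration"]

entry = {"text": "You know the rules and so do I.", "start": 3.5, "duration": 4.0}
print(segment_span(entry))                        # (3.5, 7.5)
print(segment_link("dQw4w9WgXcQ", entry["start"]))
```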

Chunking the transcript

The RecursiveCharacterTextSplitter from LangChain helps split long transcripts into smaller, more manageable chunks for easier processing. This function takes a processed transcript and breaks it down into specified chunk sizes, with some overlap between chunks to ensure context is preserved across segments. This is useful when handling large texts that need to be processed by models or other tools.

Here's the function to chunk the transcript:

```python
def chunk_transcript(processed_transcript, chunk_size=200, chunk_overlap=20):
    # Initialize the RecursiveCharacterTextSplitter with the specified chunk size and overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    # Split the transcript into chunks
    chunks = text_splitter.split_text(processed_transcript)
    return chunks
```

Explanation

  • Chunking the transcript: This function splits a large transcript into smaller chunks using a specified chunk_size (default is 200 characters) and an optional chunk_overlap (default is 20 characters) to maintain context across the chunks.

  • RecursiveCharacterTextSplitter: Ensures the text is split intelligently, without breaking up sentences or paragraphs unnaturally, making it ideal for processing large documents or transcripts.

Example usage

Below is an example of how to call chunk_transcript() using a processed transcript:

```python
# Sample processed transcript string
processed_transcript = """Text: We're no strangers to love. Start: 0.0
Text: You know the rules and so do I. Start: 3.5
Text: A full commitment's what I'm thinking of. Start: 7.5"""
# Chunk the transcript
chunks = chunk_transcript(processed_transcript)
# Output the chunks
print(chunks)
```

Expected output

The chunk_transcript() function returns a list of strings, each representing a chunk of the processed transcript. Note that the short sample above fits within the default chunk_size of 200 characters, so it comes back as a single chunk. A longer transcript is split into overlapping chunks, for example:

```python
[
    "Text: We're no strangers to love. Start: 0.0\nText: You know the rules and so do I. Start: 3.5",
    "Text: You know the rules and so do I. Start: 3.5\nText: A full commitment's what I'm thinking of. Start: 7.5"
]
```

In this output, each chunk contains overlapping segments of the transcript to maintain context, which is useful when processing each chunk with models or tools that require shorter inputs.
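The size/overlap mechanics can be illustrated without LangChain. The naive character window below is not the recursive, separator-aware algorithm that RecursiveCharacterTextSplitter uses, but it shows why consecutive chunks share text:

```python
def naive_chunks(text, chunk_size=200, chunk_overlap=20):
    # Slide a window of chunk_size characters, stepping back chunk_overlap
    # characters each time so consecutive chunks share some context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "x" * 450
print([len(c) for c in naive_chunks(text)])  # [200, 200, 90]
```

Each 200-character chunk begins 20 characters before the previous one ended, so context that straddles a chunk boundary is never lost entirely.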

Setting up a watsonx model

Credentials setup

Set up the necessary credentials to access IBM Watson services. This function initializes the required credentials, client, and project details for interacting with the watsonx model.

```python
def setup_credentials():
    # Define the model ID for the watsonx model being used
    model_id = "meta-llama/llama-3-2-3b-instruct"
    # Set up the credentials by specifying the URL for IBM Watson services
    credentials = Credentials(url="https://us-south.ml.cloud.ibm.com")
    # Create an API client using the credentials
    client = APIClient(credentials)
    # Define the project ID associated with the watsonx platform
    project_id = "skills-network"
    # Return the model ID, credentials, client, and project ID for later use
    return model_id, credentials, client, project_id
```

Defining parameters

Configure the parameters for the watsonx model. This function sets up various generation parameters, such as the decoding method and token limits, to customize the behavior of the model during text generation.

```python
def define_parameters():
    # Return a dictionary containing the parameters for the watsonx model
    return {
        # Use GREEDY decoding for generating text
        GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
        # Specify the maximum number of new tokens to generate
        GenParams.MAX_NEW_TOKENS: 900,
    }
```
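GREEDY decoding means the model always emits the single most probable next token rather than sampling. A toy illustration of that selection rule (not the watsonx internals):

```python
def greedy_pick(token_probs):
    # Greedy decoding: at each step, take the token with the highest probability
    return max(token_probs, key=token_probs.get)

# Hypothetical next-token distribution at one generation step
step = {"cat": 0.62, "dog": 0.25, "hat": 0.13}
print(greedy_pick(step))  # cat
```

MAX_NEW_TOKENS then caps how many such steps the model takes, bounding the length of the generated summary or answer.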

Initializing the watsonx LLM

Instantiate the watsonx LLM for summarization and Q&A tasks. This function initializes the watsonx language model by providing the necessary model ID, credentials, project ID, and parameters for its configuration.

```python
def initialize_watsonx_llm(model_id, credentials, project_id, parameters):
    # Create and return an instance of WatsonxLLM with the specified configuration
    return WatsonxLLM(
        model_id=model_id,           # Set the model ID for the LLM
        url=credentials.get("url"),  # Retrieve the service URL from the credentials
        project_id=project_id,       # Set the project ID for accessing resources
        params=parameters            # Pass the parameters controlling model behavior
    )
```

Embedding and similarity search

This section covers the process of embedding transcript chunks and implementing similarity search using FAISS.

Embedding the transcript chunks

We use the IBM SLATE-30M (ENG) model to generate embeddings for the transcript chunks. This function initializes the embedding model, which converts the textual data into numerical vectors that can be utilized for various natural language processing tasks, such as similarity calculations, clustering, and machine learning model training.

Embeddings are dense vector representations of text, where similar pieces of text are mapped to nearby points in the vector space. This allows models to understand the semantic relationships between words and phrases, making embeddings crucial for tasks like information retrieval, clustering, and classification.

The SLATE-30M model is specifically designed for English language embeddings, providing high-quality representations that capture the semantic meaning of the input text.
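The idea that "similar text maps to nearby points" can be made concrete with cosine similarity. The tiny hand-made vectors below stand in for real SLATE-30M embeddings, which have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative only)
video_chunk = [0.9, 0.1, 0.0]
related_query = [0.8, 0.2, 0.1]
unrelated_query = [0.0, 0.1, 0.9]

print(cosine_similarity(video_chunk, related_query) >
      cosine_similarity(video_chunk, unrelated_query))  # True
```

A retriever built on embeddings ranks chunks by exactly this kind of score, so a query about a topic surfaces the transcript chunks that discuss it.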

```python
def setup_embedding_model(credentials, project_id):
    # Create and return an instance of WatsonxEmbeddings with the specified configuration
    return WatsonxEmbeddings(
        model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,  # The SLATE-30M embedding model
        url=credentials["url"],                           # Retrieve the service URL from the credentials
        project_id=project_id                             # Set the project ID for accessing resources
    )
```

Implementing FAISS for similarity search

FAISS (Facebook AI Similarity Search) is a library designed for efficient similarity search and clustering of dense vectors, enabling rapid retrieval of nearest neighbors in high-dimensional spaces. This function creates a FAISS index from a list of text chunks using the specified embedding model. By converting text chunks into embeddings and indexing them with FAISS, we can quickly find the most similar chunks based on cosine similarity or other distance metrics. This is particularly useful in applications such as information retrieval, recommendation systems, and natural language understanding.

The function takes two parameters:

  • chunks: A list containing the text chunks that have been processed and are ready for indexing. These chunks typically represent segments of text that need to be compared or searched.
  • embedding_model: The model used to generate embeddings for the text chunks. This model transforms the chunks into numerical vectors that represent their semantic meaning.
```python
def create_faiss_index(chunks, embedding_model):
    """
    Create a FAISS index from text chunks using the specified embedding model.
    :param chunks: List of text chunks
    :param embedding_model: The embedding model to use
    :return: FAISS index
    """
    # Use the FAISS library to create an index from the provided text chunks
    return FAISS.from_texts(chunks, embedding_model)
```

Performing similarity search

In this section, we will search for specific queries within the embedded transcript using the FAISS index. The function takes a query and finds the top k most similar text chunks from the indexed embeddings. This is useful for retrieving relevant information based on user queries or for identifying related content within a larger dataset.

The function takes the following parameters:

  • faiss_index: The FAISS index created from the embedded transcript chunks. This index allows for efficient similarity searches based on vector representations.

  • query: The text input for which we want to find similar chunks. This could be a user question or a topic of interest.

  • k: An optional parameter that specifies the number of similar results to return (default is 3).

```python
def perform_similarity_search(faiss_index, query, k=3):
    """
    Search for specific queries within the embedded transcript using the FAISS index.
    :param faiss_index: The FAISS index containing embedded text chunks
    :param query: The text input for the similarity search
    :param k: The number of similar results to return (default is 3)
    :return: List of similar results
    """
    # Perform the similarity search using the FAISS index
    results = faiss_index.similarity_search(query, k=k)
    return results
```
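Conceptually, similarity_search ranks every stored vector against the query and returns the best matches; FAISS's contribution is doing this efficiently at scale. The brute-force sketch below illustrates the same idea, with a toy letter-frequency "embedding" standing in for a real model:

```python
import math

def embed(text):
    # Toy stand-in "embedding": a normalized letter-frequency vector.
    # Real systems use learned models like SLATE-30M instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def brute_force_search(chunks, query, k=3):
    # Score every chunk by dot product with the query vector (cosine
    # similarity, since the vectors are normalized) and return the top k.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(embed(c), q)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

chunks = ["full commitment", "never gonna give you up", "strangers to love"]
print(brute_force_search(chunks, "commitment", k=1))  # ['full commitment']
```

This linear scan is O(number of chunks) per query; FAISS index structures avoid scanning everything, which matters once you index many long transcripts.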

Summarizing the transcript

Define the prompt template

In this section, we define a prompt template for the language model to summarize a YouTube video transcript. The prompt serves as a structured instruction that guides the model in generating a coherent summary. The template includes placeholders for dynamic content, specifically the transcript of the video.

The function returns a PromptTemplate object configured to accept the transcript as an input variable. This ensures that when the prompt is utilized, it can seamlessly incorporate the specific transcript text that needs summarization. The summary generated by the model should focus on the key points, omitting any timestamps present in the original transcript.

```python
def create_summary_prompt():
    """
    Create a PromptTemplate for summarizing a YouTube video transcript.
    :return: PromptTemplate object
    """
    # Define the template for the summary prompt
    template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an AI assistant tasked with summarizing YouTube video transcripts. Provide concise, informative summaries that capture the main points of the video content.
Instructions:
1. Summarize the transcript in a single concise paragraph.
2. Ignore any timestamps in your summary.
3. Focus on the spoken content (Text) of the video.
Note: In the transcript, "Text" refers to the spoken words in the video, and "start" indicates the timestamp when that part begins in the video.<|eot_id|><|start_header_id|>user<|end_header_id|>
Please summarize the following YouTube video transcript:
{transcript}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
    # Create the PromptTemplate object with the defined template
    prompt = PromptTemplate(
        input_variables=["transcript"],
        template=template
    )
    return prompt
```

Instantiate the LLMChain for summarization

In this section, we create an LLMChain for generating summaries from the provided transcript using the defined prompt template. An LLMChain is a construct that combines a language model with a specific prompt, facilitating the process of generating text outputs based on the input data.

The function accepts the following parameters:

  • llm: An instance of the language model that will be used for summarization. This model processes the input and generates the summary.

  • prompt: A PromptTemplate instance that contains the structured prompt for the model. It guides the LLM in generating a concise summary of the transcript.

  • verbose: A boolean parameter that determines whether to enable verbose output during the summary generation process (default is True).

This function returns an LLMChain instance, which can then be used to generate summaries efficiently.

```python
def create_summary_chain(llm, prompt, verbose=True):
    """
    Create an LLMChain for generating summaries.
    :param llm: Language model instance
    :param prompt: PromptTemplate instance
    :param verbose: Boolean to enable verbose output (default: True)
    :return: LLMChain instance
    """
    return LLMChain(llm=llm, prompt=prompt, verbose=verbose)
```

Retrieving relevant context and generating answers

Define the retrieval function

This function retrieves relevant context from the FAISS index based on the user's query. It leverages the similarity search capabilities of FAISS to find the most pertinent documents or chunks that relate to the input query.

Parameters:

  • query (str): The user's query string that specifies what information is being sought.
  • faiss_index (FAISS): The FAISS index containing the embedded documents, which allows for efficient similarity searches.
  • k (int, optional, default=7): The number of most relevant documents to retrieve.

Returns:

  • list: A list of the k most relevant documents (or document chunks) that match the query.

Function workflow:

  1. Takes the user's query as input.
  2. Uses the FAISS index to perform a similarity search based on the vector representations (embeddings) of the query and the documents in the index.
  3. Returns the k most similar documents or chunks to the query.
```python
def retrieve(query, faiss_index, k=7):
    """
    Retrieve relevant context from the FAISS index based on the user's query.
    Parameters:
        query (str): The user's query string.
        faiss_index (FAISS): The FAISS index containing the embedded documents.
        k (int, optional): The number of most relevant documents to retrieve (default is 7).
    Returns:
        list: A list of the k most relevant documents (or document chunks).
    """
    relevant_context = faiss_index.similarity_search(query, k=k)
    return relevant_context
```

Creating the Q&A prompt template

This function structures the prompt for answering questions based on the video content. It is designed to guide the AI in providing accurate and detailed responses to user queries.

Returns:

  • PromptTemplate: A configured PromptTemplate object for Q&A tasks.

Function workflow:

  1. Defines a template string that instructs the AI on its role and the task it needs to perform.
  2. Creates a PromptTemplate object with the template and the required input variables.
  3. Returns the configured PromptTemplate.
```python
def create_qa_prompt_template():
    """
    Create a PromptTemplate for question answering based on video content.
    Returns:
        PromptTemplate: A PromptTemplate object configured for Q&A tasks.
    """
    # Define the template string
    qa_template = """
You are an expert assistant providing detailed answers based on the following video content.
Relevant Video Context: {context}
Based on the above context, please answer the following question:
Question: {question}
"""
    # Create the PromptTemplate object
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template=qa_template
    )
    return prompt_template
```

Example usage

Following is an example of how to call create_qa_prompt_template() to create a prompt template for Q&A tasks:

```python
# Create the Q&A prompt template
qa_prompt_template = create_qa_prompt_template()
# Example of how to use the prompt template with context and a question
context = "This video explains the fundamentals of quantum physics."
question = "What are the key principles discussed in the video?"
# Generate the prompt
generated_prompt = qa_prompt_template.format(context=context, question=question)
# Output the generated prompt
print(generated_prompt)
```

Expected output

The create_qa_prompt_template() function returns a PromptTemplate object, which can be used to format the context and question. The output from the format() method will look like this:

```
You are an expert assistant providing detailed answers based on the following video content.
Relevant Video Context: This video explains the fundamentals of quantum physics.
Based on the above context, please answer the following question:
Question: What are the key principles discussed in the video?
```

Setting up the Q&A LLMChain

This section demonstrates how to instantiate an LLMChain for generating answers to questions based on a given prompt template and language model.

Returns:

  • LLMChain: An instantiated LLMChain ready for question answering.

Function workflow:

  1. Takes a language model and a prompt template as inputs.
  2. Creates an LLMChain that combines the model and the prompt.
  3. Sets the verbosity of the chain.
  4. Returns the configured LLMChain object.
```python
def create_qa_chain(llm, prompt_template, verbose=True):
    """
    Create an LLMChain for question answering.
    Args:
        llm: Language model instance
            The language model to use in the chain (e.g., WatsonxGranite).
        prompt_template: PromptTemplate
            The prompt template to use for structuring inputs to the language model.
        verbose: bool, optional (default=True)
            Whether to enable verbose output for the chain.
    Returns:
        LLMChain: An instantiated LLMChain ready for question answering.
    """
    return LLMChain(llm=llm, prompt=prompt_template, verbose=verbose)
```

Generating an answer

This section demonstrates how to retrieve relevant context from a FAISS index and generate an answer based on user input.

Returns:

  • str: The generated answer to the user's question.

Function workflow:

  1. Retrieves relevant context using the FAISS index based on the user's question.
  2. Uses the question-answering chain (LLMChain) to generate an answer based on the retrieved context and the user's question.
  3. Returns the generated answer.
```python
def generate_answer(question, faiss_index, qa_chain, k=7):
    """
    Retrieve relevant context and generate an answer based on user input.
    Args:
        question: str
            The user's question.
        faiss_index: FAISS
            The FAISS index containing the embedded documents.
        qa_chain: LLMChain
            The question-answering chain (LLMChain) to use for generating answers.
        k: int, optional (default=7)
            The number of relevant documents to retrieve.
    Returns:
        str: The generated answer to the user's question.
    """
    # Retrieve relevant context
    relevant_context = retrieve(question, faiss_index, k=k)
    # Generate the answer using the QA chain
    answer = qa_chain.predict(context=relevant_context, question=question)
    return answer
```

Summarizing a video

This function generates a summary of a video using the preprocessed transcript. It uses IBM Watson's services to create an effective summary, ensuring that if the transcript hasn't been fetched yet, it fetches it first.

Returns:

  • str: The generated summary of the video or a message indicating that no transcript is available.

Function workflow:

  1. Validates that a YouTube URL was provided, returning a message if not.
  2. Fetches the transcript from the given URL and preprocesses it, storing the result in the processed_transcript global variable.
  3. Sets up IBM Watson credentials.
  4. Initializes the watsonx LLM for summarization.
  5. Creates a summary prompt and chain.
  6. Generates and returns the summary of the video.
# Initialize an empty string to store the processed transcript after fetching and preprocessing
processed_transcript = ""

def summarize_video(video_url):
    """
    Title: Summarize Video
    Description:
        This function generates a summary of the video using the preprocessed transcript.
        If the transcript hasn't been fetched yet, it fetches it first.
    Args:
        video_url (str): The URL of the YouTube video from which the transcript is to be fetched.
    Returns:
        str: The generated summary of the video or a message indicating that no transcript is available.
    """
    global fetched_transcript, processed_transcript
    if video_url:
        # Fetch and preprocess transcript
        fetched_transcript = get_transcript(video_url)
        processed_transcript = process(fetched_transcript)
    else:
        return "Please provide a valid YouTube URL."
    if processed_transcript:
        # Step 1: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()
        # Step 2: Initialize the watsonx LLM for summarization
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())
        # Step 3: Create the summary prompt and chain
        summary_prompt = create_summary_prompt()
        summary_chain = create_summary_chain(llm, summary_prompt)
        # Step 4: Generate the video summary
        summary = summary_chain.run({"transcript": processed_transcript})
        return summary
    else:
        return "No transcript available. Please fetch the transcript first."

Answering a user's question

This function retrieves relevant context from the FAISS index based on the user's query and generates an answer using the preprocessed transcript. It first checks if the transcript has been fetched; if not, it fetches and processes the transcript from the provided YouTube video URL.

If the transcript is available and a user question is provided, the function proceeds to chunk the transcript for better context retrieval. It then sets up IBM Watson credentials and initializes the watsonx LLM specifically for Q&A tasks.

Next, it creates a FAISS index using the chunked transcript and sets up the Q&A prompt template and chain. Finally, it generates an answer to the user's question using the FAISS index and returns the answer. If the transcript hasn't been fetched or if the user fails to provide a valid question, the function returns a relevant message indicating the issue.

def answer_question(video_url, user_question):
    """
    Title: Answer User's Question
    Description:
        This function retrieves relevant context from the FAISS index based on the user's query
        and generates an answer using the preprocessed transcript.
        If the transcript hasn't been fetched yet, it fetches it first.
    Args:
        video_url (str): The URL of the YouTube video from which the transcript is to be fetched.
        user_question (str): The question posed by the user regarding the video.
    Returns:
        str: The answer to the user's question or a message indicating that the transcript
        has not been fetched.
    """
    global fetched_transcript, processed_transcript
    # Check if the transcript needs to be fetched
    if not processed_transcript:
        if video_url:
            # Fetch and preprocess transcript
            fetched_transcript = get_transcript(video_url)
            processed_transcript = process(fetched_transcript)
        else:
            return "Please provide a valid YouTube URL."
    if processed_transcript and user_question:
        # Step 1: Chunk the transcript (only for Q&A)
        chunks = chunk_transcript(processed_transcript)
        # Step 2: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()
        # Step 3: Initialize the watsonx LLM for Q&A
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())
        # Step 4: Create a FAISS index for the transcript chunks (only needed for Q&A)
        embedding_model = setup_embedding_model(credentials, project_id)
        faiss_index = create_faiss_index(chunks, embedding_model)
        # Step 5: Set up the Q&A prompt and chain
        qa_prompt = create_qa_prompt_template()
        qa_chain = create_qa_chain(llm, qa_prompt)
        # Step 6: Generate the answer using the FAISS index
        answer = generate_answer(user_question, faiss_index, qa_chain)
        return answer
    else:
        return "Please provide a valid question and ensure the transcript has been fetched."

Setting up a Gradio interface

This section describes the setup of a Gradio interface for interacting with a YouTube video, allowing users to fetch its transcript, summarize it, or ask questions based on the content of the video. The interface is built using Gradio's Blocks API, which facilitates the creation of interactive web applications with minimal code.

with gr.Blocks() as interface:
    # Input field for YouTube URL
    video_url = gr.Textbox(label="YouTube Video URL", placeholder="Enter the YouTube Video URL")
    # Outputs for summary and answer
    summary_output = gr.Textbox(label="Video Summary", lines=5)
    question_input = gr.Textbox(label="Ask a Question About the Video", placeholder="Ask your question")
    answer_output = gr.Textbox(label="Answer to Your Question", lines=5)
    # Buttons for selecting functionalities after fetching the transcript
    summarize_btn = gr.Button("Summarize Video")
    question_btn = gr.Button("Ask a Question")
    # Display status message for transcript fetch
    transcript_status = gr.Textbox(label="Transcript Status", interactive=False)
    # Set up button actions
    summarize_btn.click(summarize_video, inputs=video_url, outputs=summary_output)
    question_btn.click(answer_question, inputs=[video_url, question_input], outputs=answer_output)

# Launch the app with the specified server name and port
interface.launch(server_name="0.0.0.0", server_port=7860)

Description

  • Input field: A textbox (video_url) is created for users to enter the URL of the YouTube video they want to analyze.

  • Output fields: Two additional textboxes, summary_output and answer_output, are used to display the generated summary and answers to user questions, respectively. A question_input textbox allows users to type their queries regarding the video content.

  • Buttons: Two buttons, summarize_btn and question_btn, are included to allow users to trigger the summarization of the video or to ask a specific question about it.

  • Transcript status: A textbox (transcript_status) displays feedback to the user regarding the status of the transcript fetching process, indicating whether it was successful or if there were issues (for example, invalid URL).

  • Button actions: The summarize_btn is linked to the summarize_video function, which takes the YouTube URL as input and returns the summary to summary_output. The question_btn is linked to the answer_question function, which takes both the YouTube URL and user question as inputs and returns the answer to answer_output.

  • Launch configuration: Finally, the interface.launch(server_name="0.0.0.0", server_port=7860) line starts the Gradio application, enabling users to access it using a web browser on their local server.

Complete code reference

In this section, you'll find the full, consolidated code for the application, which includes all code snippets provided in previous steps. Use this as a reference to ensure that your implementation is consistent with the complete code structure required for the application to function as intended.

# Import necessary libraries for the YouTube bot
import gradio as gr
import re  # For extracting the video ID
from youtube_transcript_api import YouTubeTranscriptApi  # For extracting transcripts from YouTube videos
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For splitting text into manageable segments
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes  # For specifying model types
from ibm_watsonx_ai import APIClient, Credentials  # For API client and credentials management
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams  # For managing model parameters
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods  # For defining decoding methods
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings  # For interacting with IBM's LLM and embeddings
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs  # For retrieving model specifications
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes  # For specifying types of embeddings
from langchain_community.vectorstores import FAISS  # For efficient vector storage and similarity search
from langchain.chains import LLMChain  # For creating chains of operations with LLMs
from langchain.prompts import PromptTemplate  # For defining prompt templates


def get_video_id(url):
    # Regex pattern to match YouTube video URLs
    pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None


def get_transcript(url):
    # Extract the video ID from the URL
    video_id = get_video_id(url)
    # Create a YouTubeTranscriptApi() object
    ytt_api = YouTubeTranscriptApi()
    # Fetch the list of available transcripts for the given YouTube video
    transcripts = ytt_api.list(video_id)
    transcript = ""
    for t in transcripts:
        # Check if the transcript's language is English
        if t.language_code == 'en':
            if t.is_generated:
                # If no transcript has been set yet, use the auto-generated one
                if len(transcript) == 0:
                    transcript = t.fetch()
            else:
                # If a manually created transcript is found, use it (overrides auto-generated)
                transcript = t.fetch()
                break  # Prioritize the manually created transcript, exit the loop
    return transcript if transcript else None


def process(transcript):
    # Initialize an empty string to hold the formatted transcript
    txt = ""
    # Loop through each entry in the transcript
    for i in transcript:
        try:
            # Append the text and its start time to the output string
            txt += f"Text: {i.text} Start: {i.start}\n"
        except (KeyError, AttributeError):
            # If there is an issue accessing 'text' or 'start', skip this entry
            pass
    # Return the processed transcript as a single string
    return txt


def chunk_transcript(processed_transcript, chunk_size=200, chunk_overlap=20):
    # Initialize the RecursiveCharacterTextSplitter with the specified chunk size and overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    # Split the transcript into chunks
    chunks = text_splitter.split_text(processed_transcript)
    return chunks


def setup_credentials():
    # Define the model ID for the watsonx model being used
    model_id = "meta-llama/llama-3-2-3b-instruct"
    # Set up the credentials by specifying the URL for IBM Watson services
    credentials = Credentials(url="https://us-south.ml.cloud.ibm.com")
    # Create an API client using the credentials
    client = APIClient(credentials)
    # Define the project ID associated with the watsonx platform
    project_id = "skills-network"
    # Return the model ID, credentials, client, and project ID for later use
    return model_id, credentials, client, project_id


def define_parameters():
    # Return a dictionary containing the parameters for the watsonx model
    return {
        # Set the decoding method to GREEDY for generating text
        GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
        # Specify the maximum number of new tokens to generate
        GenParams.MAX_NEW_TOKENS: 900,
    }


def initialize_watsonx_llm(model_id, credentials, project_id, parameters):
    # Create and return an instance of WatsonxLLM with the specified configuration
    return WatsonxLLM(
        model_id=model_id,  # Set the model ID for the LLM
        url=credentials.get("url"),  # Retrieve the service URL from credentials
        project_id=project_id,  # Set the project ID for accessing resources
        params=parameters  # Pass the parameters for model behavior
    )


def setup_embedding_model(credentials, project_id):
    # Create and return an instance of WatsonxEmbeddings with the specified configuration
    return WatsonxEmbeddings(
        model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,  # Set the model ID for the SLATE-30M embedding model
        url=credentials["url"],  # Retrieve the service URL from the provided credentials
        project_id=project_id  # Set the project ID for accessing resources in the Watson environment
    )


def create_faiss_index(chunks, embedding_model):
    """
    Create a FAISS index from text chunks using the specified embedding model.
    :param chunks: List of text chunks
    :param embedding_model: The embedding model to use
    :return: FAISS index
    """
    # Use the FAISS library to create an index from the provided text chunks
    return FAISS.from_texts(chunks, embedding_model)


def perform_similarity_search(faiss_index, query, k=3):
    """
    Search for specific queries within the embedded transcript using the FAISS index.
    :param faiss_index: The FAISS index containing embedded text chunks
    :param query: The text input for the similarity search
    :param k: The number of similar results to return (default is 3)
    :return: List of similar results
    """
    # Perform the similarity search using the FAISS index
    results = faiss_index.similarity_search(query, k=k)
    return results


def create_summary_prompt():
    """
    Create a PromptTemplate for summarizing a YouTube video transcript.
    :return: PromptTemplate object
    """
    # Define the template for the summary prompt
    template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an AI assistant tasked with summarizing YouTube video transcripts. Provide concise, informative summaries that capture the main points of the video content.
Instructions:
1. Summarize the transcript in a single concise paragraph.
2. Ignore any timestamps in your summary.
3. Focus on the spoken content (Text) of the video.
Note: In the transcript, "Text" refers to the spoken words in the video, and "start" indicates the timestamp when that part begins in the video.<|eot_id|><|start_header_id|>user<|end_header_id|>
Please summarize the following YouTube video transcript:
{transcript}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
    # Create the PromptTemplate object with the defined template
    prompt = PromptTemplate(
        input_variables=["transcript"],
        template=template
    )
    return prompt


def create_summary_chain(llm, prompt, verbose=True):
    """
    Create an LLMChain for generating summaries.
    :param llm: Language model instance
    :param prompt: PromptTemplate instance
    :param verbose: Boolean to enable verbose output (default: True)
    :return: LLMChain instance
    """
    return LLMChain(llm=llm, prompt=prompt, verbose=verbose)


def retrieve(query, faiss_index, k=7):
    """
    Retrieve relevant context from the FAISS index based on the user's query.
    Parameters:
        query (str): The user's query string.
        faiss_index (FAISS): The FAISS index containing the embedded documents.
        k (int, optional): The number of most relevant documents to retrieve (default is 7).
    Returns:
        list: A list of the k most relevant documents (or document chunks).
    """
    relevant_context = faiss_index.similarity_search(query, k=k)
    return relevant_context


def create_qa_prompt_template():
    """
    Create a PromptTemplate for question answering based on video content.
    Returns:
        PromptTemplate: A PromptTemplate object configured for Q&A tasks.
    """
    # Define the template string
    qa_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an expert assistant providing detailed and accurate answers based on the following video content. Your responses should be:
1. Precise and free from repetition
2. Consistent with the information provided in the video
3. Well-organized and easy to understand
4. Focused on addressing the user's question directly
If you encounter conflicting information in the video content, use your best judgment to provide the most likely correct answer based on context.
Note: In the transcript, "Text" refers to the spoken words in the video, and "start" indicates the timestamp when that part begins in the video.<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Relevant Video Context: {context}
Based on the above context, please answer the following question:
{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
    # Create the PromptTemplate object
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template=qa_template
    )
    return prompt_template


def create_qa_chain(llm, prompt_template, verbose=True):
    """
    Create an LLMChain for question answering.
    Args:
        llm: Language model instance
            The language model to use in the chain (e.g., WatsonxGranite).
        prompt_template: PromptTemplate
            The prompt template to use for structuring inputs to the language model.
        verbose: bool, optional (default=True)
            Whether to enable verbose output for the chain.
    Returns:
        LLMChain: An instantiated LLMChain ready for question answering.
    """
    return LLMChain(llm=llm, prompt=prompt_template, verbose=verbose)


def generate_answer(question, faiss_index, qa_chain, k=7):
    """
    Retrieve relevant context and generate an answer based on user input.
    Args:
        question: str
            The user's question.
        faiss_index: FAISS
            The FAISS index containing the embedded documents.
        qa_chain: LLMChain
            The question-answering chain (LLMChain) to use for generating answers.
        k: int, optional (default=7)
            The number of relevant documents to retrieve.
    Returns:
        str: The generated answer to the user's question.
    """
    # Retrieve relevant context
    relevant_context = retrieve(question, faiss_index, k=k)
    # Generate answer using the QA chain
    answer = qa_chain.predict(context=relevant_context, question=question)
    return answer


# Initialize an empty string to store the processed transcript after fetching and preprocessing
processed_transcript = ""


def summarize_video(video_url):
    """
    Title: Summarize Video
    Description:
        This function generates a summary of the video using the preprocessed transcript.
        If the transcript hasn't been fetched yet, it fetches it first.
    Args:
        video_url (str): The URL of the YouTube video from which the transcript is to be fetched.
    Returns:
        str: The generated summary of the video or a message indicating that no transcript is available.
    """
    global fetched_transcript, processed_transcript
    if video_url:
        # Fetch and preprocess transcript
        fetched_transcript = get_transcript(video_url)
        processed_transcript = process(fetched_transcript)
    else:
        return "Please provide a valid YouTube URL."
    if processed_transcript:
        # Step 1: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()
        # Step 2: Initialize the watsonx LLM for summarization
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())
        # Step 3: Create the summary prompt and chain
        summary_prompt = create_summary_prompt()
        summary_chain = create_summary_chain(llm, summary_prompt)
        # Step 4: Generate the video summary
        summary = summary_chain.run({"transcript": processed_transcript})
        return summary
    else:
        return "No transcript available. Please fetch the transcript first."


def answer_question(video_url, user_question):
    """
    Title: Answer User's Question
    Description:
        This function retrieves relevant context from the FAISS index based on the user's query
        and generates an answer using the preprocessed transcript.
        If the transcript hasn't been fetched yet, it fetches it first.
    Args:
        video_url (str): The URL of the YouTube video from which the transcript is to be fetched.
        user_question (str): The question posed by the user regarding the video.
    Returns:
        str: The answer to the user's question or a message indicating that the transcript
        has not been fetched.
    """
    global fetched_transcript, processed_transcript
    # Check if the transcript needs to be fetched
    if not processed_transcript:
        if video_url:
            # Fetch and preprocess transcript
            fetched_transcript = get_transcript(video_url)
            processed_transcript = process(fetched_transcript)
        else:
            return "Please provide a valid YouTube URL."
    if processed_transcript and user_question:
        # Step 1: Chunk the transcript (only for Q&A)
        chunks = chunk_transcript(processed_transcript)
        # Step 2: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()
        # Step 3: Initialize the watsonx LLM for Q&A
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())
        # Step 4: Create a FAISS index for the transcript chunks (only needed for Q&A)
        embedding_model = setup_embedding_model(credentials, project_id)
        faiss_index = create_faiss_index(chunks, embedding_model)
        # Step 5: Set up the Q&A prompt and chain
        qa_prompt = create_qa_prompt_template()
        qa_chain = create_qa_chain(llm, qa_prompt)
        # Step 6: Generate the answer using the FAISS index
        answer = generate_answer(user_question, faiss_index, qa_chain)
        return answer
    else:
        return "Please provide a valid question and ensure the transcript has been fetched."


with gr.Blocks() as interface:
    gr.Markdown(
        "<h2 style='text-align: center;'>YouTube Video Summarizer and Q&A</h2>"
    )
    # Input field for YouTube URL
    video_url = gr.Textbox(label="YouTube Video URL", placeholder="Enter the YouTube Video URL")
    # Outputs for summary and answer
    summary_output = gr.Textbox(label="Video Summary", lines=5)
    question_input = gr.Textbox(label="Ask a Question About the Video", placeholder="Ask your question")
    answer_output = gr.Textbox(label="Answer to Your Question", lines=5)
    # Buttons for selecting functionalities after fetching the transcript
    summarize_btn = gr.Button("Summarize Video")
    question_btn = gr.Button("Ask a Question")
    # Display status message for transcript fetch
    transcript_status = gr.Textbox(label="Transcript Status", interactive=False)
    # Set up button actions
    summarize_btn.click(summarize_video, inputs=video_url, outputs=summary_output)
    question_btn.click(answer_question, inputs=[video_url, question_input], outputs=answer_output)

# Launch the app with the specified server name and port
interface.launch(server_name="0.0.0.0", server_port=7860)
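As a quick sanity check of the URL parsing in the code above, the `get_video_id` regex can be run standalone before wiring up the rest of the pipeline:

```python
import re

def get_video_id(url):
    # Same pattern as in ytbot.py: captures the 11-character video ID
    pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None

print(get_video_id("https://www.youtube.com/watch?v=T-D1OfcDW1M"))  # T-D1OfcDW1M
print(get_video_id("not a url"))  # None
```

Note that the pattern matches only the `https://www.youtube.com/watch?v=` form; shortened `youtu.be` links would need an additional pattern.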

Serve the application

To serve the application, paste the following into your Python terminal:

python3.11 ytbot.py

If you cannot find an open Python terminal or the buttons on the above cell do not work, you can launch a terminal by going to Terminal > New Terminal. However, if you launch a new terminal, do not forget to source the virtual environment you created at the beginning of the tutorial before running the above line:

source my_env/bin/activate # activate my_env

Launch the application

You are now ready to launch the served application! To launch, click the following button:

If the above button does not work, complete the following steps:

  1. Select the Skills Network extension.
  2. Click Launch Application.
  3. Insert the port number (in this case 7860, which is the server port we put in ytbot.py)
  4. Click Your Application to launch the application. Note: If the application does not work using Your Application, use the icon Open in new browser tab.

Alternative application launch instruction image

Test the application

To test the application, you can use the YouTube video link https://www.youtube.com/watch?v=T-D1OfcDW1M. This video offers a high-level introduction to RAG, the technique of grounding an LLM's responses in trusted sources to reduce the likelihood of hallucinations.

Steps to generate the summary

  1. Input the video URL: Enter the following URL into the input field labeled "YouTube Video URL":
    https://www.youtube.com/watch?v=T-D1OfcDW1M
  2. Summarize the video: Click the Summarize Video button. The application will fetch the transcript and generate a summary based on the content of the video.
  3. View the summary: Once the summarization is complete, the generated summary will be displayed in the Video Summary text box.

Example questions

After summarizing the video, you can engage further by asking specific questions:

  1. Question: How does one reduce hallucinations?
    This question can’t be answered accurately without context, as the term ‘hallucination’ can refer either to a psychological condition in humans or to the generation of false or misleading outputs by large language models (LLMs). Fortunately, in this case, we have a video transcript that provides the necessary context. To confirm this, simply paste the question into the Ask a Question About the Video input field and click the Ask a Question button.

  2. Question: Which problems does RAG solve, according to the video?
    In this case we are asking for information that is specifically contained in the video. In order to obtain a context-aware response, paste the question into the Ask a Question About the Video input field and click the Ask a Question button.

Conclusion

In this lab, you explored the use of AI and NLP techniques to fetch, summarize, and ask questions about YouTube videos. You learned how to:

  • Fetch a video transcript and preprocess it.
  • Use AI models to generate concise summaries of the video's content.
  • Retrieve relevant information based on user questions, using advanced Q&A techniques.

You've made great progress, and if you missed anything, don't worry! You can always come back and do the lab again to reinforce your understanding.

Next steps

Now that you've gained hands-on experience with video summarization and Q&A, here are some ideas for further exploration:

  1. Try asking different questions: Experiment with asking new types of questions based on the video you already used. For instance, you can ask about specific timestamps, deeper insights on discussed topics, or further clarifications.

  2. Use a different video: Test the application with a new video. Simply input a different YouTube URL and see how well the summarizer and Q&A tool handle new content. This will help you assess the model's adaptability to different video topics and formats.

  3. Enhance the application: Consider adding new features such as sentiment analysis on the video transcript or enabling the tool to summarize videos in different languages.
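As a starting point for the sentiment-analysis idea above, a toy lexicon-based score over the transcript text can be prototyped in a few lines. This is purely illustrative: the word lists are made up for the example, and a real feature would use a proper sentiment model or library.

```python
# Toy lexicon-based sentiment scoring (illustrative only; the word sets
# below are assumptions, not a real sentiment lexicon).
POSITIVE = {"good", "great", "helpful", "accurate", "reliable"}
NEGATIVE = {"bad", "wrong", "misleading", "hallucination", "error"}

def sentiment_score(text):
    """Return a score in [-1, 1]: +1 all positive, -1 all negative, 0 neutral."""
    words = text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("RAG gives good, reliable answers"))    # 1.0
print(sentiment_score("without grounding, output is wrong"))  # -1.0
```

Such a score could be computed over `processed_transcript` and shown in an extra Gradio output box alongside the summary.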

Author(s)

Other Contributor(s)